Mutual information in random Boolean models of regulatory networks 
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The amount of mutual information contained in time series of two elements gives a measure of 
how well their activities are coordinated. In a large, complex network of interacting elements, such 
as a genetic regulatory network within a cell, the average of the mutual information over all pairs 
(7) is a global measure of how well the system can coordinate its internal dynamics. We study 
this average pairwise mutual information in random Boolean networks (RBNs) as a function of the 
distribution of Boolean rules implemented at each element, assuming that the links in the network 
are randomly placed. Efficient numerical methods for calculating (7) show that as the number of 
network nodes N approaches infinity, the quantity A'^(7) exhibits a discontinuity at parameter values 
corresponding to critical RBNs. For finite systems it peaks near the critical value, but slightly in 
the disordered regime for typical parameter variations. The source of high values of A''{7) is the 
indirect correlations between pairs of elements from different long chains with a common starting 
point. The contribution from pairs that are directly linked approaches zero for critical networks and 
peaks deep in the disordered regime. 

PACS numbers: 87.10.+e, 89.75.Fb, 02.50.Ng 



I. INTRODUCTION 

The dynamical behavior of a large, complex network of 
interacting elements is generally quite difficult to under- 
stand in detail. One often has only partial information 
about the interactions involved and the presence of mul- 
tiple influences on each element can give rise to exceed- 
ingly complicated dynamics even in fully deterministic 
systems. A paradigmatic case is the network of genes 
within a cell, where the interactions correspond to tran- 
scriptional (and post-transcriptional) regulatory mecha- 
nisms. The expression of a single gene may be subject 
to regulation by itself and up to 15-20 proteins derived 
from other genes, and the network of such interactions 
has a complicated structure, including positive and nega- 
tive feedback loops and nontrivial combinatorial logic. In 
this paper we study mutual information (defined below) 
a measure of the overall level of coordination achieved in 
models of complex regulatory networks. We find a sur- 
prising discontinuity in this measure for infinite systems 
as parameters are varied. We also provide heuristic ex- 
planations of the infinite system results, the influence of 
noise, and finite size effects. 

The theory of the dynamics of such complicated net- 
works begins with the study of the simplest model sys- 
tems rich enough to exhibit complex behaviors: Random 
Boolean Networks (RBNs). In a RBN model, each gene 
(or "node" ) 5 is represented as a Boolean logic gate that 
receives inputs from some number kg other genes. The 
RBN model takes the network to be drawn randomly 
from an ensemble of networks in which (i) the inputs 
to each gene are chosen at random from among all of 
the genes in the system; and (ii) the Boolean rule at g 
is selected at random from a specified distribution over 
all possible Boolean rules with kg inputs. These two as- 



sumptions of randomness permit analytical insights into 
the typical behavior of a large network. 

One important feature of RBNs is that their dynamics 
can be classified as ordered, disordered, or critical. In 
"ordered" RBNs, the fraction of genes that remain dy- 
namical after a transient period vanishes like 1/N as the 
system size N goes to infinity; almost all of the nodes 
become "frozen" on an output value (0 or 1) that does 
not depend on the initial state of the network In 
this regime the system is strongly stable against tran- 
sient perturbations of individual nodes. In "disordered" 
(or "chaotic") RBNs, the number of dynamical, or "un- 
frozen" nodes scales like N and the system is unstable to 
many transient perturbations [f| . 

For present purposes, we consider ensembles of RBNs 
parametrized by the average indegree K (i.e., average 
number of inputs to the nodes in the network), and the 
bias p in the choice of Boolean rules. The indegree dis- 
tribution is Poissonian with mean K and at each node 
the rule is constructed by assigning the output for each 
possible set of input values to be 1 with probability p, 
with each set treated independently, li p = 0.5, the rule 
distribution is said to be unbiased. For a given bias, the 
critical connectivity, Kc, is equal to 0: 

^ [2pil - p)]-\ (1) 

For K < Kc the ensemble of RBNs is in the ordered 
regime; for K > Kc, the disordered regime. For K — 
Kc, the ensemble exhibits critical scaling of the number 
of unfrozen nodes; e.g., the number of unfrozen nodes 
scales like N^^^. The order-disorder transition in RBNs 
has been characterized by several quantities, including 
fractions of unfrozen nodes, convergence or divergence in 
state space, and attractor lengths 0]. 

It is an attractive hypothesis that cell genetic regula- 
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tory networks are critical or perhaps slightly in the or- 
dered regime 0, [1] ■ Critical networks display an intrigu- 
ing balance between robust behavior in the presence of 
random perturbations and flexible switching induced by 
carefully targeted perturbations. That is, a typical at- 
tractor of a critical RBN is stable under the vast majority 
of small, transient perturbations (flipping one gene to the 
"wrong" state and then allowing the dynamics to proceed 
as usual), but there are a few special perturbations that 
can lead to a transition to a different attractor. This ob- 
servation forms the conceptual basis for thinking of cell 
types as attractors of critical networks, since cell types 
are both homeostatic in general and capable of differenti- 
ating when specific signals (perturbations) are delivered. 

Recently, some experimental evidence has been shown 
to support the idea that genetic regulatory networks in 
eukaryotic cells are dynamically critical. In Ref. [6], the 
microarray patterns of gene activities of HeLa cells were 
analyzed, and the trajectories in a HeLa microarray time- 
series data characterized using a Lempel-Ziv complex- 
ity measure on binarized data. The conclusion was that 
cells are either ordered or critical, not disordered. In 
Ref. 01 , it was deduced that deletion of genes in criti- 
cal networks should yield a power law distribution of the 
number of genes that alter their activities with an expo- 
nent of —1.5 and observed data on 240 deletion mutants 
in yeast showed this same exponent. And in Ref. sj, 
micro-array gene expression data following silencing of a 
single gene in yeast was analyzed. Again, the data sug- 
gests critical dynamics for the gene regulatory network. 
These results suggest that operation at or near criticality 
confers some evolutionary advantage. 

In this paper we consider a feature that quantifies 
the sense in which critical networks are optimal choices 
within the class of synchronously updated RBNs. We 
study a global measure of the propagation of information 
in the network, the average pairwise mutual information, 
and show that it takes its optimal value on the ensemble 
of critical networks. Thus, within the limits of the RBN 
assumptions of random structure and logic, the critical 
networks enable information to be transmitted most ef- 
ficiently to the greatest number of network elements. 

The average pairwise mutual information is defined as 
follows. Let Sa be a process that generates a with 
probability po and a 1 with probability pi . We define the 
entropy of Sq as 

iJ[sa] = -pologaPo - Pi log2 Pi- (2) 

Similarly, for a process Sab that generates pairs xy with 
probabilities Pxy, where x,y £ {0, 1}, we define the joint 
entropy as 

H[sab] = -poo log2 Poo - Poi loga Poi 

-piologjPio -piilogaPii. (3) 

For a particular RBN, we imagine running the dynam- 
ics for infinitely long times and starting from all possi- 
ble initial configurations. The fraction of time steps for 



which the value of node i is x gives px for the process 
Si. The value oi Pxy for the process Sij is given by the 
fraction of time steps for which node i has the value x 
and on the next time step node j has the value y. The 
mutual information of the pair ij is 

M,j=H[s,]+H[sj]-H[s,j]. (4) 

With this definition, Mij measures the extent to which 
information about node i at time t infiuences node j one 
time step later. Note that the propagation may be in- 
direct; a nonzero Mij can result when i is not an input 
to J but both are influenced by a common node through 
previous time steps. 

To quantify the efficiency of information propagation 
through the entire network, we define the average pair- 
wise mutual information for an ensemble of networks to 
be 

(/) = /7V-2^Af,A, (5) 

where (•) indicates an average over members of the en- 
semble. It has previously been observed that (/) is maxi- 
mized near the critical regime in numerical simulations of 
random Boolean networks with a small number of nodes 
(less than 500) 0. 

In general, one does not expect a given element to be 
strongly correlated with more than a few other elements 
in the network, so the number of pairs ij that contribute 
significantly to the sum in Eq. ([5]) is expected to be at 
most of order N . It is therefore convenient to work with 
the quantity Xat = N{I), which may approach a nonzero 
constant in the large N limit. We use the symbol Too to 
denote the N ^ oo limit oUn- 

As an aside, we note that other authors have con- 
sidered different information measures and found opti- 
mal behavior for critical Boolean networks. Krawitz and 
Shmulevich have found that "basin entropy," which char- 
acterizes the number and sizes of basins of attraction and 
hence the ability of the system to respond differently 
to different inputs, is maximized for critical networks 
[Tot . Luque and Ferrera have studied the self-overlap [ll[ , 
which differs from (/) in that it involves comparison of 
each node to its own state one time step later (not to the 
state of another node that might be causally connected 
to it) and that the average over the network is done be- 
fore calculating the mutual information. Bertschinger 
and Natschlager have introduced the "network-mediated 
separation" (A'^M-separation) in systems where all nodes 
are driven by a common input signal, finding that critical 
networks provide maximal A^Af-separation for different 
input signals [12]. Our definition of (/) places the focus 
on the autonomous, internal dynamics of the network and 
the transmission of information along links, which allows 
for additional insights into the information fiow. 

Two simple arguments immediately show that Too is 
zero both in the ordered regime and deep in the disor- 
dered regime. First, note that My = whenever Si or 
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Sj generates only Os or only Is. In the ordered regime, 
where almost all nodes remain frozen on the same value 
on all attractors, the number of nonzero elements Mij 
remains bounded for large N. Thus (/) must be of order 
N"'^ and Too — everywhere in the ordered regime. 

Second, if Sy is the product of two independent pro- 
cesses Si and Sj, then My = 0. This occurs for every pair 
of connected nodes in the limit of strong disorder, where 
K is very large and the Boolean rules are drawn from 
uniformly weighted distributions over all possible rules 
with kg inputs. The correlation between the output of a 
node and any particular one of its inputs becomes van- 
ishingly small because there are many combinations of 
the other inputs, each producing a randomly determined 
output value, so the probability for the output to be 1 is 
close to p for either value of the given input. Too therefore 
vanishes in the limit of large K. 

Given that Too — for all network parameters that 
yield ordered ensembles, one might expect that it rises to 
a maximum somewhere in the disordered regime before 
decaying back to zero in the strong disorder limit. We 
show below that this is not the case. Fixing the bias 
parameter p at 1/2 and allowing the average indegree K 
to vary, we find that Too exhibits a jump discontinuity 
at the critical value K = 2, then decays monotonically 
to zero as K is increased. The conclusion is that among 
ensembles of unbiased RBNs, average pairwise mutual 
information is maximized for critical ensembles. 

We begin by presenting analytic arguments and nu- 
merical methods for investigating the large system limit 
and establishing the existence of a discontinuity at K — 2 
and monotonic decay for K > 2. We then present results 
from numerical experiments obtained by averaging over 
10'' instances of networks of sizes up to = 1000. These 
data show a strong peak near the critical value K = 2, 
as expected from the analysis. Interestingly, the peak is 
substantially higher than the size of the jump discontinu- 
ity, which may indicate that Too for if = 2 is an isolated 
point larger than lmij^_,2+ loo ■ Finally, we present nu- 
merical results on the variation of Tn with p at fixed K , 
which again shows a peak for critical parameter values. 



II. AVERAGE PAIRWISE MUTUAL 
INFORMATION IN LARGE NETWORKS 

A. Mean-field calculation of Too 

Mean-field calculations are commonly used in the the- 
ory of random Boolean networks. The most common 
forms of mean-field calculations are within the realm of 
the so called annealed approximation. In the annealed 
approximation, one assumes that the rules and the in- 
puts are randomized at each time step. This approach is 
sufficient, for example, for calculating the average num- 
ber of nodes that change value at each time step. 

For understanding the propagation of information, a 
slightly more elaborate mean-field model is needed. This 



mean-field model is based on the assumption that the 
state of a node in a large disordered network is inde- 
pendent of its state at the previous time step, but that 
its rule remains fixed. In this model, each node takes 
the value 1 with a given probability 6, which we refer 
to as the local bias. In the annealed approximation all 
local biases are equal because the rules and the inputs 
are redrawn randomly at each time step, so the system 
is characterized by a single global bias. In our extended 
mean-field model, we consider a distribution of local bi- 
ases. To determine loo, we determine the distribution of 
6, then use it to analyze the simple feed forward struc- 
tures that provide the nonvanishing contributions to Xqo 
in the disordered regime. 



B. The distribution of local biases 

An important feature characterizing the propagation 
of information in a network is the distribution of local 
biases. The local bias at a given node is determined by 
the rule at that node and the local biases of its inputs. 
Roughly speaking, when the bias of the output value is 
stronger than the bias of the inputs, information is lost in 
transmission through the node. The local bias distribu- 
tion is defined as the self-consistent distribution obtained 
as the limit of a convergent iterative process. 

Let Bt be the stochastic function that at each evalua- 
tion returns a sample b from the local bias distribution 
at time t. Then, a sample h' from Bt+i can be obtained 
as follows. Let r be a Boolean rule drawn from the net- 
work's rule distribution R and let k denote the number 
of inputs to r. Furthermore, let . . . ,6fc} be a set of 
k independent samples from Bt- The sample h' is then 
given by 

k 

b'= r{a)l[[aA + {l-<7^)il-b,)] ■ (6) 

CTe{o,i}'= i=i 

Repeated sampling of the rule r and the values bi pro- 
duces samples b' that define the distribution Bt+i- The 
sequence of distributions Bq, Bi, B2, ■ ■ ■ is initiated by Bq 
that always returns 1/2. 

For many rule distributions R, P{Bt < x) converges 
as t — > 00. For such rule distributions, we define B* as 
the stochastic function that satisfies 

P{B* <x)^ lim P{Bt < x) (7) 

t — ^00 

for all X. Intuitively, B* is the large t limit of Bt and we 
use cumulative probabilities in the definition of B* for 
technical reasons: the probability density function of Bt 
for any t is a sum of delta functions and the probability 
density of i3* is likely to have singularities. 

We defer the evaluation of B* to Section fll Dl because 
some adjustments are required to obtain efficient numer- 
ical procedures. 



C. Mutual information in feed- forward structures 

Given a rule distribution R that has a weU-defined dis- 
tribution B* of local biases, we calculate the mutual in- 
formation between pairs of nodes in feed-forward struc- 
tures that are relevant in the large network limit in the 
disordered regime. This technique is based on the as- 
sumption that the value of a given node at time t + n is 
statistically independent of the value at time t for n ^ 0, 
in which case the behavior of the inputs to a feed-forward 
structure can be fully understood in terms of B* . 

The most direct contribution to (/) between t and t + 1 
comes from comparing an input to a node with the out- 
put of the same node. Other contributions to (/) come 
from chains of nodes that share a common starting point. 
(See Fig. [1]) In the general case, we consider configura- 
tions where a node io has outputs to a chain of n nodes 
ii, . . . ,i„ and another chain of rt + 1 nodes ji, . . . , jn+i. 
This means that node im has one input from node im-i 
for m = 1, . . . , n, that node ji has one input from node 
io, and that node jm has one input from node jm-i for 
m — 2, . . . ,n + 1. We allow the special case n = and 
let it represent the case where an input to a node is com- 
pared to the output of the same node. 

To calculate the contribution Mi„j„^^ to (/), we need 
to determine the probability distribution p^y for x = 
<Ji^{t) and y = crj^_^_-^{t + 1). Here we assume that all 
external inputs to the feed-forward structure are statisti- 
cally independent because the probability to find a recon- 
nection between two paths of limited length approaches 
zero as ^ oo. In Fig. [U this means that there are 
no (undirected) paths linking any of the pictured nodes 
other than those formed by the pictured links. 

Based on each rule in the structure and the biases at 
the external inputs, we calculate the conditional prob- 
abilities to obtain given output values for each value of 
the internal input within the structure. We represent this 
information by matrices of the form 



P(/3 I a) 



P{f3 



I a 

1 I a 



0) P(/3: 
0) F(/3: 



where a and (3 are Boolean variables. Let 

T„ = P(fT,,„ (<) I [t - 1)) for m = 1, , 

T;=P(a,,(t)|a,Ji-l)), and 
T™ - P(fT,,„ (t) I {t - 1)) for m = 2, , 



(8) 



,n, (9) 
(10) 

,n + l. 
(11) 



Note that the elements in each of these matrices depend 
on the rule chosen at the node index in the first argument 
of P and that the choice of rule specifies the number k 
of inputs to that node. Multiplication of these matrices 
corresponds to following a signal that passes through the 
feed-forward structure, so we have 

P(a,„ {t + n)\ a,, (t)) = T„T„_i • • • Ti (12) 




FIG. 1: Schematic structure assumed for the mean-field calcu- 
lation of Too. The average indegree of a node in the network is 
K — 3. Black nodes are an example of a directly linked pair. 
Light grey nodes are an example of a pair that contributes 
to Toe because of a shared influence (io). Information from 
io takes exactly one time step longer (one additional link) to 
get to the light grey node on the right than to the one on the 
left. The node labels mark two chains of the type referred to 
in the text. Hatching indicates frozen nodes. 



and 

P(^.„+i {t + n + l)\ a,, it)) - T;+iT; • • • T; . (13) 

The probabilities for the pairs cr^^ {t)aj^^-^ {t+1) can be 
expressed as elements of a matrix P(cri„ (i), CTj„+i (t + 1)) 
where 

p(.,,).f^;^=?'^=o) S"=?''=!iV (14) 

^ '^^ \^P(a; = 1,?; = 0) P{x = l,y=l)J ^ ' 

Note that P(a;, y) is the matrix of values Pxy defined pre- 
viously, which has a different meaning from P(/3 | a). In 
accordance with the definition of Mij above, we define 
the mutual information associated with a matrix Q with 
elements qxy to be 

m) = E ^-V l0g2 -7^ Ti^F^ V ■ (15) 

(E. Ixz) 

Now let P„ denote P(si^(t), Sj^^^^ [t -f 1)) and let 

We can then write 
P„ = T„T„_i • • • TiBo(Tl)T(T^)T • • • (T:,+i)T . (17) 
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For a given set of the indegrees ki^, . . . , ki^ and 
kj^, . . . , kj^^^ denoted by k, we let (/(P„))k denote the 
average mutual information associated with P„. Note 
that (/(Pn))k is the contribution to I^o arising from the 
average over injn+i pair of nodes in chains with a given 
k. 

The average number of occurrences of a feed-forward 
structure with the vector k is given by Nwk, where 

2n+l 
m—l 

and P{k = fc„j) denotes the probability that a randomly 
selected rule from the rule distribution R has k^ inputs. 

Putting together Eqs. ((8|)- (fT8)) . we obtain the expres- 
sion 

oo 

^oo^Y. E ^"k(/(P„))k (19) 

"=Okez^"+^ 

for rule distributions in the disordered regime. 

Numerical evaluation of this expression is cumbersome, 
but can be streamlined substantially by handling the 
frozen nodes (local biases 6 = and b = 1) analyti- 
cally. Removing these nodes from the core of the cal- 
culations provides additional insights and yields a ver- 
sion of Eq. that can be sampled more efhciently by 
Monte-Carlo techniques. 

The fraction of unfrozen nodes is given by u = P{B* ^ 
0, 1) and the distribution of local bias in the set of un- 
frozen nodes is given by B*, where one sample b of B* is 
obtained by sampling b from B* repeatedly until a value 
of 6 not equal to or 1 is obtained. Similarly, we define 
an altered rule distribution R^. A sample from R^ 
is obtained by sampling r from R and fixing each input 
to with probability P{B* — 0) and 1 with probability 
P{B* = 1). New samples r are drawn until one obtains 
a nonconstant function of the fcu inputs that are not 
frozen. (The probability of an unfrozen node having a 
given value of k^ is given below.) 

We define P" and by replacing B* and R with 
B* and R^ in the definitions of P„ and Wk, respectively. 
With these definitions, we rewrite Eq. ([19]) and get 

oo 

^oo = ^E E <(^(Pn))k- (20) 

"=o kez^"+i 

Let R{K) denote the rule distribution for a Poissonian 
distribution of indegrees k such that (k) = K and uni- 
form distributions among all Boolean rules with a given 
k. Note that the definition does not require that a rule 
does depend on all of its inputs. From this point, we 
restrict the discussion to distributions of the form R{K). 
We expect the qualitative behavior of this rule distribu- 
tion to be representative for a broad range of rule distri- 
butions. The critical point for R{K) occurs sX, K — 2, 
with ordered networks arising for K < 2 and disordered 
networks ior K > 2. 



The symmetric treatment of Os and Is in the rule dis- 
tribution simplifies the calculation of u in the sense that 
it is sufficient to keep track of the probability to obtain 
constant nodes from Bt and there is no need to distin- 
guish nodes that are constantly 1 from those that are 
constantly 0. Let ut denote P{Bt ^ {0,1}). Then, ut+i 
can be calculated from ut according to 

u.,,=e--- 5:^^(1 -2^-^). (21) 

fc=0 

The desired value u is given by the stable fixed point of 
the map Ut i— > Ut+i- Note that this map is identical to the 
damage control function presented in Ref. pL,], meaning 
that the above determination of u is consistent with the 
process of recursively identifying frozen nodes without 
the use of any mean-field assumption. 

The rule distribution Ru{K) within the set of unfrozen 
nodes gives a rule with km inputs with probability 

P{k^ ^ km) = e-^"^^^(l - 2^'--^) (22) 

The distribution of rules with k inputs is uniform among 
all nonconstant Boolean rules with the given number of 
inputs. This expression can be used in Eq. (|18p . Further 
analysis is helpful in determining the limiting value of 
loo as K approaches its critical value of 2 from above. 
Appendix \K\ addresses this issue. 



D. Numerical sampling 

We are now in a position to evaluate Eq. (PO)) by an 
efficient Monte-Carlo technique. To obtain a distribution 
that is a good approximation of -B*, we use an iterative 
process to create vectors of samples. Each vector has a 
fixed number S of samples. The process is initiated by 
a vector bo where all S nodes are set to 1/2. Then a 
sequence of vectors is created by iteratively choosing a 
vector bt+i based on the previous vector bt. To obtain 
each node in bj+i, we use Eq. ^ with r sampled from 
R'^{K) defined below and bi, . . . ,bk being randomly se- 
lected elements of bj . 

If the rule distribution R'^{K) were set to Ru{K), the 
sequence of vectors would fail to converge properly for K 
that are just slightly larger than 2. For such K, Ru{K) 
gives a 1-input rule with a probability close to 1. This 
leads to slow convergence and to a proliferation of copies 
of identical bias values in the sequence of vectors {bt}. 
The remedy for this problem is quite simple. Due to 
the symmetry between and 1 in the rule distribution, 
application of a 1-input rule to an input bias distribution 
B* gives the same output bias distribution B* . Thus, 
we can remove the 1-input rules from Ru{K) without 
altering the limiting distribution at large t. We let R'^{K) 
denote the rule distribution obtained by disregarding all 
1-input samples from Ru{K). 
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Based on {bt}, we construct matrices that can be 
used for estimating the sums in Eqs. ((^ and (|A14p 
by random sampling. After an initial number of steps 
required for convergence, we sample S matrices of the 
form P(r(<T) | cti) where r is drawn from R[^{K) and 
the inputs (T2, • ■ • , Cfc have biases drawn from bt. These 
matrices and the indegrees of the corresponding rules 
are stored in the vectors pt and kj, respectively. The 
elements of the vectors bt, pt, and kt are indexed by 
i = 1, . . . , 5 and the ith element of each vector is de- 
noted by 6t,ii Pt,i, and kt^i, respectively. For notational 
convenience, we define 

Pt,o = and kt,o = 1 (23) 

for all t — 0, 1, 2, . . .. With these definitions, the i = 
elements correspond to a copy operator. 

To estimate the sum in Eq. (|20l) . we truncate it at 
" = J^max where rimax is chosen to be sufficiently large 
for the remaining terms to be negligible. Then we ob- 
tain random samples in the following way. We select 
io uniformly from {!,..., S"} and set each of the in- 
dices ii, . . . , in„,axi ill ■ ■ • 1 *nmax+i ^o wlth probablHty 
P{ku = 1) or to a uniformly chosen sample of {1, . . . , S*} 
with probability P(fcu > !)• Then we set 

Po = (^V^" bl)(P^'n^^^ ^o = K,,- (24) 




K 



FIG. 2: The large system limit Too for N{I) (solid line) 
and the contribution to Too from direct information transfer 
through single nodes (dashed line). The empty circles at the 
discontinuity of loo indicate that we do not know the value of 
Too for K = 2. The size of the sample vectors is S = 10*. The 
number of vectors used was 10^-10* and these were drawn af- 
ter 10^ steps taken for convergence in bt. The summation 
cutoff rimax varies from 6 for high K to 100 for K close to 2. 
For the limit of Too for K 2+ using Eq. (|A14[I . increasing 
the summation cutoff from Wmax ~ 20 to Wmax ~ 40 gave no 
significant difference in the result. 



and 

= Pt,.„P^i(Pt,,„+J^ I for n = 1, . . . ,n„,ax. (25) 

Kn — Ktji^Kn— l^tj^^i J 

With /(P") given by Eq. (fTS]) . we construct samples of 
the mutual information associated with sets of nodes in 
chain structures: 

^max 

/chain = £ '««/(P") . (26) 

n=0 

The average value of /chain provides an approximation of 
the sum in Eq. ((20|) and we get 

Xoo ~ "(/chain) • (27) 

This approximation is good if t, n,„ax, and S are suffi- 
ciently large. For evaluating (/chain), we draw S samples 
of /chain for Several subsequent t that are large enough to 
ensure convergence in bt. 

The technique just described is easily generalized to ac- 
count for uncorrelated noise in the dynamics. To model 
a system in which each node has a probability e of gener- 
ating the wrong output at each time step, we need only 
modify r(cr) in Eq. §^ and P(/3 | a) of Eq. ^ as follows: 

r,{a)^{l-e)r{cT) + e[l~r{cT)] (28) 



and 

P.(/3|«)= (^;' ll,)p(/3|«), (29) 

where P is determined as above by the Boolean rule at 
a given node. For nonzero e, all nodes are unfrozen, but 
the calculations proceed exactly as above with b' and P 
replaced by b'^ and Pg, respectively. 

We use a similar technique to estimate lim/f ^2+ based 
on Eq. (|A14[) . The main differences in this technique are 
that 1-input nodes do not enter the numerical sampling 
and the sum to be evaluated is two-dimensional rather 
than one-dimensional. 



III. RESULTS 
A. The large system limit 

Using the the expressions derived in Section III CI and 
the stochastic evaluation techniques described in Sec- 
tion III Dl we have obtained estimates of Zoo foi' K > 
2. Using the expressions in Appendix \^ we obtain 
lim/f_^2+ 2^00 ■ The results are shown in Fig. 

The solid line in Fig. [5] shows the full result for Too ■ 
The dashed line shows the contribution to loo that comes 
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h{B*) J, 




FIG. 3: Histograms h{B^) of the distributions of unfrozen lo- 
cal biases b drawn from for K 2+ (bold line), K = 3 
(medium line), and K = i (thin line). Bins of width 10"* 
were used to estimate the probability density from a sequence 
of 10^ sample vectors bt that were drawn after 10"^ steps for 
convergence. The size of the sample vectors is S = 10*. The 
combination of a small bin-width and a large sample size en- 
ables a clear picture of the strongest singularities. 



FIG. 4: The large system limit Too as a function of K for 
various noise levels e in the updating. The thin solid line 
shows Too for networks without noise as displayed in Fig. [2] 
The other lines represent e — 0.001 (thick solid line), 0.01 
(dashed line), and 0.1 (dotted line). The size of the sample 
vectors is S = lO". 10^-10^ were drawn after 10'^ steps taken 
for convergence in bt. Extensive sampling was required close 
to criticality for e — 0.001. 



from pairs of nodes that are directly linked in the net- 
work. It is interesting to note that the direct links alone 
are not responsible for the peak at criticality. Rather, it 
is the correlations between indirectly linked nodes that 
produce the effect, and in fact dominate loo for K at and 
slightly above the critical value. 

The distribution of local biases plays an important role 
in determining Zoo . Biases that are significantly different 
from b = 1/2 are important for K that are not deep 
into the disordered regime, and the distribution of local 
biases is highly nonuniform. Dense histograms of biases 
drawn from the distribution B* for various K are shown 
in Fig. [3l Singularities at 6 = and 6=1 occur for K 
in the range 2 < K < 3.4, and for all K > 2 there is a 
singularity at b — 1/2. 

When uncorrelated noise is added to each node at each 
time step, 2oo may decrease due to the random errors, 
but may also increase due to the unfreezing of nodes. 
The net effect as a function of K is shown in Fig. |4] for 
the case where each output is inverted with probability 
e on each time step. As e is increased from zero, the 
peak shifts to the disordered regime and broadens. The 
mutual information due to random unfreezing is clearly 
visible on the ordered side. In the regime where indirect 
contributions dominate Zoo, however, there is a strong 
decrease as correlations can no longer be maintained over 
long chains. Deep in the disordered regime, we see the 
sHght decrease expected due to the added randomness. 
For e > 0.1, the maximum of Iqo shifts back toward K = 
2. In fact, it can be shown that as e approaches 1/2, 



which corresponds to completely random updating, the 
Too curve approaches 

^oo = Q - cM-K/2) . (30) 

In this limit, the maximum occurs at A' = 2 and the peak 
height scales like (1/2 — e)^. The fact that the critical K 
is recovered in the strong noise limit is coincidental; it 
would not occur for most other choices of Boolean rule 
distributions. 



B. Finite size effects 

Numerical simulations on finite networks reveal an im- 
portant feature near the critical value of K that is not an- 
alytically accessible using the above techniques because 
of the difficulty of calculating loo right at the critical 
point. (We have only computed the limit as K ap- 
proaches Kc, not the actual value at Kc ) We compute 
(/) by sampling the mutual information from pairs of 
nodes from many networks. 

In collecting numerical results to compare to the loo 
calculation, there are some subtleties to consider. The 
calculations are based on correlations that persist at long 
times in the mean-field model. To observe these, one 
must disregard transient dynamics and also average over 
the dynamics of different attractors of each network. The 
latter average should be done by including data from 
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FIG. 5: Xjv as a function of K for several different system 
sizes. For these calculation we use 10'' networks with 40 runs 
from different initial states per network and a discarded tran- 
sient of length 10* updates for each run. (For large K, good 
convergence was obtained for discarded transients of length 
10^) The sequences of states were recorded for a sample of 
lOA'^ pairs of nodes in each network. The vertical dashed line 
indicates the critical value of K. 



FIG. 6: Tn as a function of p for several different system sizes. 
For these calculation we use 10'' networks with 40 runs from 
different initial states per network and a discarded transient 
of length 10''. (For some data points far into the disordered 
regime, good convergence was obtained for discarded tran- 
sients of length 10^.) The sequences of states were recorded 
for a sample of lOA'^ pairs of nodes in each network. The 
vertical dashed line indicates the critical value of p. 



all the attractors in the calculation of the mutual infor- 
mation, not by calculating separate mutual information 
calculated for individual attractors. For the results pre- 
sented here, we have observed satisfactory convergence 
both for increasing lengths of discarded transients and 
for increasing numbers of initial conditions per network. 
Finally, an accurate measurement of the mutual infor- 
mation requires sufficiently long observation times; short 
observation times lead to systematic overestimates of the 
mutual information. (See, for example, Ref. (Tsj.) In the 
figures below, the size of the spurious contribution due 
to finite observation times is smaller than the symbols on 
the graph. 

Fig.Oshows that the peak in In extends well above the 
computed loo value. The figure shows In as a function of 
K for several system sizes N . As N increases, the curve 
converges toward the infinite N value both in the ordered 
and disordered regimes. In the vicinity of the critical 
point, however, the situation is more complicated. The 
limiting value at criticality will likely depend on the order 
in which the large size and K — > limits are taken. 

We have also studied In as a function of the bias pa- 
rameter p, while holding K fixed at 4. Fig. [5] shows that 
In is again peaked at the critical point p = {2 — \/2)/4; 
the qualitative structure of the curves is the same as that 
for varying K. The calculation ofloo ior p 7^ 1/2 requires 
modifications of the analysis described above that are be- 
yond the scope of this work. 



IV. SPECIAL RULE DISTRIBUTIONS 

Up to now, the discussion has focused on rule distri- 
butions parametrized only by an independent probabil- 
ity p of finding a 1 in a given row of the truth table 
for any given node. Consideration of other possibilities 
shows that loo can actually be made as large as desired 
in networks that are as deep as desired in the disordered 
regime. Let A be the average sensitivity of a node to 
its inputs; i.e., the average number of nodes that change 
values when the value of one randomly selected node is 
fiipped. A = 1 is one criterion for identifying critical net- 
works [14]. For any value of A in the disordered regime 
(A > 1) and any target value I of Xqo, one can always 
define a rule distribution that gives a random network 
characterized by A and I. The key to constructing the 
distribution is the observation that long chains of single- 
input nodes produce large Xqo and that a small fraction 
of nodes with many inputs and maximally sensitive rules 
(multi-input versions of XOr) is enough to make A large. 

Consider the following class of random networks. Each 
node has an indegree k of either 1 or g, with the prob- 
ability of having g inputs being 7. The logic function 
at each node is the parity function or its negation. For 
k — 1 nodes this means they either copy or invert their 
input. (There are no nodes with constant outputs.) For 
k ~ g nodes it means that a change in any single input 
causes a change in the output. Note that there are no 
frozen nodes in these networks. 
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For these networks, we have 

A = (fc) = 1-7 + 57. (31) 

The network consists of 7iV nodes with muhiple inputs, 
which can be thought of as the roots of a tree of single- 
input nodes. If ^ 7iV, loops in the graph will be rare 
enough that they will have little effect on the average 
pairwise mutual information. If g and 7 are fixed and N 
is taken to infinity, loops can be neglected in computing 
Too- For a node with g > 1, the mutual information 
between any given input node and the output is zero for 
the rule distribution under consideration. This is because 
the bias distribution in networks consisting entirely of 
maximally sensitive nodes is a delta function at h = 1/2. 
Thus (/(P„))k = for all k ^ {1,1,..., 1}. For k = 
{1,1,..., 1}, Eq. HH]) gives = (1 - 7)2"+! and we get 
from Eq. (fT9|) : 



By choosing 7 <C 1 and 5 ^ I/7 we can make Too as 
large as desired while simultaneously making A as large 
as desired. 

Generalization of this construction to networks with a 
broader distribution of indegrees and/or rules is straight- 
forward. Roughly speaking, high Zoo occurs deep in the 
disordered regime when there is a small fraction of nodes 
of high indegree and high sensitivity and the remaining 
nodes are sensitive to exactly one input. 

V. CONCLUSIONS 

In the introduction above, we noted early evidence that 
eukaryotic cells may be dynamically critical. Our calcu- 
lations indicate that, within the class of RBNs with ran- 
domly assigned inputs to each node and typically stud- 
ied rule distributions ^ critical networks provide an opti- 
mal capacity for coordinating dynamical behaviors. This 
type of coordination requires the presence of substantial 
numbers of dynamical (unfrozen) nodes, the linking of 
those nodes in a manner that allows long-range prop- 
agation of information while limiting interference from 
multiple propagating signals, and a low error rate. To 
the extent that evolutionary fitness depends on such co- 
ordination and RBN models capture essential features 
of the organization of genetic regulatory networks, crit- 
ical networks are naturally favored. We conjecture that 
mutual information is optimized in critical networks for 
broader classes of networks that include power-law de- 
gree distributions and/or additional local structure such 
as clustering or over-representation of certain small mo- 
tifs. 

A key insight from our study is that the maximiza- 
tion of average pairwise mutual information is achieved in 
RBNs by allowing long chains of effectively single-input 
nodes to emerge from the background of frozen nodes 



and nodes with multiple unfrozen inputs. The correla- 
tions induced by these chains are reduced substantially 
when stochastic effects are included in the update rules, 
thus destroying the jump discontinuity in Too at the crit- 
ical point and shifting the curve toward the dashed one 
in Fig. [5] obtained from direct linkages only. Though the 
noise we have modeled here is rather strong, correspond- 
ing to a large fluctuation in the expression of a given 
gene from its nominally determined value, a shift of the 
maximum into the disordered regime may be expected to 
occur in other models. 

The behavior of the average pairwise mutual informa- 
tion in RBNs with flat rule distributions is nontrivial and 
somewhat surprising. This is due largely to the fact that 
the network of unfrozen nodes in nearly critical systems 
does indeed have long single-input chains. By choosing 
a rule distribution carefully, however, we can arrange to 
enhance the effect and produce arbitrarily high values of 
Too even deep in the disordered regime. Whether real 
biological systems have this option is less clear. The in- 
teractions between transcription factors and placement 
of binding sites required to produce logic with high sen- 
sitivity to many inputs appear difficult (though not im- 
possible) to realize with real molecules [15| . 

Maximization of pairwise mutual information may be 
a sensible proxy for maximization of fitness within an en- 
semble of evolutionarily accessible networks: we suggest 
that systems based on high-(/) networks can orchestrate 
complex, timed behaviors, possibly allowing robust per- 
formance of a wide spectrum of tasks. If so, the maxi- 
mization of pairwise mutual information within the space 
of networks accessible via genome evolution may play an 
important role in natural selection of real genetic net- 
works. We have found that maximization of pairwise mu- 
tual information can be achieved deep in the disordered 
regime by sufficiently nonuniform Boolean rule distribu- 
tions. However, in the absence of further knowledge, a 
roughly flat rule distribution remains the simplest choice, 
and in this case pairwise mutual information is maxi- 
mized for critical networks. Given the tentative evidence 
for criticality in real genetic regulatory networks [1, 0, [1] , 
these results may be biologically important. 
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APPENDIX A: APPROACHING CRITICALITY 

The mean-field calculations in Section III CI are not ap- 
plicable to critical networks, but we can investigate the 
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behavior for disordered networks that are close to criti- 
cahty. In this appendix, we investigate the limit 



lim Xn, 



K- 



(Al) 



for networks with the rule distribution R{K). 

Let K = 2 + e where e is a small positive number. The 
fraction of unfrozen nodes goes to zero as e 0, meaning 
that it is appropriate to expand Eq. (|2ip for small Ut- A 
second order Taylor expansion yields 



Ut+l ~ 2 

and the fixed point u satisfies 



1 



meaning that 



K-2 



2e. 



(A2) 



(A3) 



(A4) 



Approximation of Eq 

P{ku 

and 



to the same order gives 



P{ku = 2) 



(A5) 



(A6) 



The probability to obtain fcu > 2 vanishes to the first 
order in e. Equation (|A6|) yields that P(fcu = 2) « |u for 
small e. However, all rules with = 2 are not proper 2- 
input rules in the sense that they do not depend on both 
inputs. Of the 14 nonconstant Boolean 2-input rules, 4 
are dependent on only one input and are effectively 1- 
input rules. Hence, the probability p2 for an unfrozen 
node to have a proper 2-input rule is given by 



P2 



(A7) 



For small e, nodes with single inputs dominate the ex- 
pression for P„ in (|17p . A matrix T corresponding to a 
one-input rule is either the unity matrix or a permuta- 
tion matrix that converts Os to Is and vice versa in the 
probability distribution. None of these matrices has any 
effect on (/(P„)) or (/(P")), because of the symmetry 
between Os and Is in the rule distribution. Hence, we 
can express (/(P"))k on the form 



(/(P^'))k = (/(PK,„i)))k 



(A8) 



where tiq and n'^, respectively, are the numbers of in- 
degrees ki-^, . . . , ki^ and fej^ , . . . , fcj„^i that are different 
from 1 and k' is corresponding vector of indegrees differ- 
ent from 1. 

In the limit e — + 0+, we can neglect indegrees larger 
than 2. Hence, we introduce (/(P„(,,,ii )) to denote the 
average mutual information of 



>(2) 

(no.ni 



T T 

J- r). J- n 



(A9) 



where Ti, . . . , T„ and T'^, . . . , T^, correspond to ran- 
domly selected 2-input rules that do depend on both 
inputs. Both Bo and the T-matrices are drawn based 
on the distribution of local biases obtained by proper 2- 
input rules only, because the symmetry between Os and 
Is ensures that rules with one input do not alter the equi- 
librium distribution Bl. 



Then, Eq. (|20p can be approximated by 



n=Ono=Orii=0 ^ ^' ^ ^ 



X (2^2)"°+"^! -P2)^"+^"""""' 



(AlO) 



for small e. This approximation is exact in the limit 
e ^ 0+ and reordering of the summation gives 



limXoo= ^("o,ni)a(Pi'U)). (All) 



e-»0 



where 



("0,"l) 



lim.Vf"V"+^ 



X (2p2)"''+"ni -P2)^"+^""°""' 

no 



lim 



(A12) 
(A13) 



Because limc_o+ u/{2p2) = 2/5, we get 



For approaching the critical point from the ordered 
regime, we know from the discussion in Section [I] that 



lim Zoo = , 



(A15) 



meaning that Xqo has a discontinuity at K = 2. From the 
scaling of the number of unfrozen nodes and the number 
of relevant nodes, we expect that Zoo is well-defined and 
different from for K ^ 2 but we have found no ana- 
lytical hints about whether this value is larger or smaller 
than lim/f^2+ loo- 

Numerical evaluation of the sum in Eq. (|A14p is car- 
ried out in close analogy with the technique described in 
Section HlDl 



11 



[1] B. Samuelsson and J. E. S. Socolar, Phys. Rev. E 74, 
036113 (2006). 

[2] B. Derrida and Y. Pomeau, Europhys. Lett. 1, 45 (1986). 

[3] M. Aldana-Gonzalez, S. Coppersmith, and L. P. 
KadanofT, in Perspectives and Problems in Nonlinear Sci- 
ence, edited by E. Kaplan, J. E. Marsdcn and K. R. 
Sreenivasan (Springer, New York, 2003), p. 23. 

[4] S. A. KaufTman, Physica D 42, 135 (1990). 

[5] D. Stauffer, J. Stat. Phys. 74, 1293 (1994). 

[6] I. Shmulevich, S. A. Kauffman, and M. Aldana, Proc. 
Natl. Acad. Sci. USA 102, 13439 (2005). 

[7] P. Ramo, J. Kesseli, and O. Yh-Harja, J. Theor. Biol. 
242, 164 (2006). 

[8] R. Serra, M. Villani, and A. Semeria, J. Theor. Biol. 227, 
149 (2004). 



[9] A. S. Ribeiro, R. A. Este, J. Lloyd-Price, and S. A. Kauff- 
man, WSEAS Trans, on Systems 5, 2935 (2006). 
[10] P. Krawitz and I. Shmulevich, Phys. Rev. Lett. 98, 
158701 (2007). 

[11] B. Luque and A. Ferrera, Complex Systems 12, 241 
(2000). 

[12] N. Bcrtschinger and T. Natschlager, Neural Comput. 16, 
1413 (2004). 

[13] F. Bazso, L. Zalanyi, and A. Petroczi, Proc. IEEE Intl. 

Joint Conf. on Neural Net. 4, 2843 (2004). 
[14] I. Shmulevich and S. A. Kauffman, Phys. Rev. Lett. 93, 

048701 (2004). 

[15] N. E. Buchler, U. Gerland, and T. Hwa, Proc. Natl. Acad. 
Sci. USA 100, 5136 (2003). 



